from IPython.display import Image
from IPython.display import HTML
Image(url= "https://www.apaservices.org/images/tile-practice-anxiety-depression-covid-19_tcm9-271978.jpg",
width = 950, height = 600)
Data set name: Indicators of Anxiety or Depression Based on Reported Frequency of Symptoms During Last 7 Days
Key features that will be included in the analysis:
Data collection began on April 23, 2020, and the data set was last updated on Nov. 29, 2022. The most recent version is available on the CDC data portal (data.cdc.gov).
%matplotlib inline
import pandas as pd
import numpy as np
import plotly.express as px
px.defaults.template = "ggplot2"
px.defaults.color_continuous_scale = px.colors.sequential.Blackbody
px.defaults.width = 750
px.defaults.height = 500
import plotly.graph_objects as go
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv(
'https://data.cdc.gov/api/views/8pt5-q6wp/rows.csv?accessType=DOWNLOAD')
Why?
Mental health during the COVID-19 pandemic is an important topic. Here I would like to visualize the effects of the COVID-19 pandemic on anxiety and depression in different groups of American residents. The information may be of interest to policy makers, health care providers, scientific researchers, and the general public.
How?
What?
Users can see the COVID-19 effects in different social groups based on:
Where?
The task is based on data sets that focus on American households.
When?
Users can use the figures when they have a specific question in mind or just explore to gain some insights.
Who?
Anyone who is interested in this topic may find some useful information.
# let's take a quick look
print(df.shape)
df.head()
(11718, 14)
| | Indicator | Group | State | Subgroup | Phase | Time Period | Time Period Label | Time Period Start Date | Time Period End Date | Value | Low CI | High CI | Confidence Interval | Quartile Range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Symptoms of Depressive Disorder | National Estimate | United States | United States | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 23.5 | 22.7 | 24.3 | 22.7 - 24.3 | NaN |
| 1 | Symptoms of Depressive Disorder | By Age | United States | 18 - 29 years | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 32.7 | 30.2 | 35.2 | 30.2 - 35.2 | NaN |
| 2 | Symptoms of Depressive Disorder | By Age | United States | 30 - 39 years | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 25.7 | 24.1 | 27.3 | 24.1 - 27.3 | NaN |
| 3 | Symptoms of Depressive Disorder | By Age | United States | 40 - 49 years | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 24.8 | 23.3 | 26.2 | 23.3 - 26.2 | NaN |
| 4 | Symptoms of Depressive Disorder | By Age | United States | 50 - 59 years | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 23.2 | 21.5 | 25.0 | 21.5 - 25.0 | NaN |
# remove the columns that we'll not use for further analysis
df = df.drop(['Phase', 'Time Period Label', 'Time Period Start Date', 'Time Period End Date',
'Low CI', 'High CI', 'Confidence Interval', 'Quartile Range'], axis=1)
df.head()
| | Indicator | Group | State | Subgroup | Time Period | Value |
|---|---|---|---|---|---|---|
| 0 | Symptoms of Depressive Disorder | National Estimate | United States | United States | 1 | 23.5 |
| 1 | Symptoms of Depressive Disorder | By Age | United States | 18 - 29 years | 1 | 32.7 |
| 2 | Symptoms of Depressive Disorder | By Age | United States | 30 - 39 years | 1 | 25.7 |
| 3 | Symptoms of Depressive Disorder | By Age | United States | 40 - 49 years | 1 | 24.8 |
| 4 | Symptoms of Depressive Disorder | By Age | United States | 50 - 59 years | 1 | 23.2 |
# let's check if the data set has any null values
df.isna().sum()
Indicator        0
Group            0
State            0
Subgroup         0
Time Period      0
Value          540
dtype: int64
# remove the rows with null values
df = df[df['Value'].notna()]
# double check to make sure the null values have been removed
df.isna().sum()
Indicator      0
Group          0
State          0
Subgroup       0
Time Period    0
Value          0
dtype: int64
# check if the data set has any duplicates
df.duplicated().sum()
0
The Value column shows the percentage of adults who report symptoms of anxiety or depression that have been shown to be associated with diagnoses of generalized anxiety disorder or major depressive disorder. To avoid confusion, we rename the column to Psychological Distress Level.
df = df.rename({'Value': 'Psychological Distress Level'}, axis='columns')
df.head()
| | Indicator | Group | State | Subgroup | Time Period | Psychological Distress Level |
|---|---|---|---|---|---|---|
| 0 | Symptoms of Depressive Disorder | National Estimate | United States | United States | 1 | 23.5 |
| 1 | Symptoms of Depressive Disorder | By Age | United States | 18 - 29 years | 1 | 32.7 |
| 2 | Symptoms of Depressive Disorder | By Age | United States | 30 - 39 years | 1 | 25.7 |
| 3 | Symptoms of Depressive Disorder | By Age | United States | 40 - 49 years | 1 | 24.8 |
| 4 | Symptoms of Depressive Disorder | By Age | United States | 50 - 59 years | 1 | 23.2 |
To avoid overcrowding the charts in the following analysis, we replace the values in the Indicator column with shorter labels.
df['Indicator'] = df['Indicator'].replace(['Symptoms of Depressive Disorder',
'Symptoms of Anxiety Disorder',
'Symptoms of Anxiety Disorder or Depressive Disorder'],
['Depression', 'Anxiety', 'Depression/Anxiety'])
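The replacement above pairs two lists by position, so each short label depends on list order. As a minimal sketch of an alternative, assuming the same three labels, a dict keeps every long label next to its short form. The names `indicator`, `short_labels`, and `shortened` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the 'Indicator' column; the labels match those in the data set.
indicator = pd.Series([
    'Symptoms of Depressive Disorder',
    'Symptoms of Anxiety Disorder',
    'Symptoms of Anxiety Disorder or Depressive Disorder',
])

# Hypothetical name: a dict keeps each long label next to its short form.
short_labels = {
    'Symptoms of Depressive Disorder': 'Depression',
    'Symptoms of Anxiety Disorder': 'Anxiety',
    'Symptoms of Anxiety Disorder or Depressive Disorder': 'Depression/Anxiety',
}

# Series.replace with a dict matches whole values, not substrings.
shortened = indicator.replace(short_labels)
print(shortened.tolist())
```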
We can see that the data set is grouped by different factors:
df["Group"].unique().tolist()
['National Estimate', 'By Age', 'By Sex', 'By Race/Hispanic ethnicity', 'By Education', 'By State', 'By Disability status', 'By Gender identity', 'By Sexual orientation']
Based on these factors, we could divide the full data set into multiple subsets for further analysis.
national_est = df[df["Group"] == "National Estimate"]
age = df[df["Group"] == "By Age"]
sex = df[df["Group"] == "By Sex"]
race = df[df["Group"] == "By Race/Hispanic ethnicity"]
edu = df[df["Group"] == "By Education"]
state = df[df["Group"] == "By State"]
disa_sta = df[df["Group"] == "By Disability status"]
gender_ide = df[df["Group"] == "By Gender identity"]
sex_ori = df[df["Group"] == "By Sexual orientation"]
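The nine filters above each scan the full data set once. A minimal sketch of a more compact approach, shown on made-up toy rows, builds all subsets in one pass with `groupby`. The names `toy`, `subsets`, and `age_toy` are illustrative only.

```python
import pandas as pd

# Made-up toy rows that only mimic the shape of the real data.
toy = pd.DataFrame({
    'Group': ['By Age', 'By Age', 'By Sex', 'By State'],
    'Subgroup': ['18 - 29 years', '30 - 39 years', 'Female', 'Alabama'],
    'Psychological Distress Level': [32.7, 25.7, 26.0, 31.0],
})

# One pass over the data: map each Group label to its own DataFrame.
subsets = {name: frame for name, frame in toy.groupby('Group')}

age_toy = subsets['By Age']
print(sorted(subsets))   # the group labels found in the data
print(len(age_toy))      # number of rows in the 'By Age' subset
```

Looking subsets up by their label also means a new group in a later release of the data would be picked up without adding another line of code.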
# let's take the 'disability status' subset as an example and take a quick look
disa_sta.head()
| | Indicator | Group | State | Subgroup | Time Period | Psychological Distress Level |
|---|---|---|---|---|---|---|
| 5822 | Depression | By Disability status | United States | With disability | 28 | 52.9 |
| 5823 | Depression | By Disability status | United States | Without disability | 28 | 18.5 |
| 5913 | Anxiety | By Disability status | United States | With disability | 28 | 56.3 |
| 5914 | Anxiety | By Disability status | United States | Without disability | 28 | 23.0 |
| 6004 | Depression/Anxiety | By Disability status | United States | With disability | 28 | 64.3 |
Great! Now we are ready to perform the tasks.
The survey collected data from American residents who report symptoms of anxiety and/or depression. First, we look at how these symptoms are distributed.
fig = px.pie(national_est, values='Psychological Distress Level', names='Indicator', hole = 0.4,
title = 'Distribution of Symptoms')
fig.update_layout(title_x=0.5, legend=dict(y=0.99, x=0.7))
fig.show()
The data set contains survey data from 51 time periods of two weeks each (April 2020 - November 2022). The levels of the different symptoms follow similar patterns and have fluctuated since the COVID-19 pandemic began.
fig = px.line(national_est, x='Time Period', y="Psychological Distress Level", color ='Indicator',
title = 'Fluctuation of Psychological Distress')
fig.show()
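The line chart above uses the numeric Time Period index on the x-axis. If calendar dates are preferred, one option is to keep the Time Period Start Date column (it is dropped earlier in this notebook) and parse it. The sketch below assumes the `MM/DD/YYYY` format shown in the first preview table. The second row is made up, and the `Start Date` column name is illustrative.

```python
import pandas as pd

# Toy rows shaped like the raw file; the second row uses made-up values.
toy = pd.DataFrame({
    'Time Period': [1, 2],
    'Time Period Start Date': ['04/23/2020', '05/07/2020'],
    'Psychological Distress Level': [23.5, 24.0],
})

# Parse the text dates into real timestamps so a chart can use a date axis.
toy['Start Date'] = pd.to_datetime(toy['Time Period Start Date'], format='%m/%d/%Y')
print(toy[['Time Period', 'Start Date']])
```

Passing `x='Start Date'` instead of `x='Time Period'` to `px.line` would then label the axis with dates.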
The main goal is to see whether the COVID-19 pandemic affected anxiety/depression levels differently in different groups.
First, we focus on the age data and plot a bar chart. Each vertical stacked bar represents the total anxiety/depression level in a given group.
fig = px.bar(age, y='Psychological Distress Level', x="Subgroup", color='Indicator', orientation='v',
hover_data=["Psychological Distress Level"],
title='Psychological Distress Level in Different Age Groups')
fig.update_layout(title_x=0.5, yaxis_title="Psychological Distress Level", xaxis_title="Age Groups")
fig.show()
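Because `age` holds one row per subgroup, indicator, and time period, the stacked bars above appear to add the values of every period together, so the bar heights are totals rather than percentages. A minimal sketch, using made-up toy numbers, averages over the time periods first with `pivot_table` so that each cell reads as a typical percentage. The names `toy` and `age_mean` are illustrative and not part of the notebook.

```python
import pandas as pd

# Made-up toy values: two age groups, two indicators, two time periods.
toy = pd.DataFrame({
    'Subgroup': ['18 - 29 years'] * 4 + ['30 - 39 years'] * 4,
    'Indicator': ['Depression', 'Depression', 'Anxiety', 'Anxiety'] * 2,
    'Time Period': [1, 2, 1, 2] * 2,
    'Psychological Distress Level': [30.0, 34.0, 40.0, 42.0, 24.0, 26.0, 30.0, 34.0],
})

# Average over time periods: one row per age group, one column per indicator.
age_mean = toy.pivot_table(index='Subgroup', columns='Indicator',
                           values='Psychological Distress Level', aggfunc='mean')
print(age_mean)
```

The same idea would apply to the other subgroup charts, since they are built from subsets with the same structure.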
Experiences of anxiety and/or depression are especially widespread among young adults, and symptom levels appear to drop as age increases. Next, we check how symptom levels vary across race groups.
fig = px.bar(race, x='Psychological Distress Level', y="Subgroup", color='Indicator', orientation='h',
hover_data=["Psychological Distress Level"],
title='Psychological Distress Level in Different Race Groups')
fig.update_layout(title_x=0.5, yaxis_title="Race")
fig.show()
It's clear that the negative effects of the pandemic have hit Black and Hispanic adults harder than other race groups.
fig = px.bar(edu, y='Psychological Distress Level', x="Subgroup", color='Indicator', orientation='v',
hover_data=["Psychological Distress Level"],
title='Psychological Distress Level in Different Education Groups')
fig.update_layout(title_x=0.5, xaxis_title="Education")
fig.show()
In terms of education level, people with the highest level of education reported the fewest symptoms, while people with the least education reported symptoms more frequently than the other subgroups.
Next we look at the effects in groups with and without disabilities.
fig = px.bar(disa_sta, x='Psychological Distress Level', y="Subgroup", color='Indicator', orientation='h',
hover_data=["Psychological Distress Level"],
title='Psychological Distress Level in Groups with Different Disability Status')
fig.update_layout(title_x=0.5, yaxis_title="Disability Status")
fig.show()
Next, we would like to compare people of different sexes, gender identities, and sexual orientations. As shown below, sexual and gender minorities report worse mental health than the majority groups.
fig = px.bar(sex, x='Psychological Distress Level', y="Subgroup", color='Indicator', orientation='h',
hover_data=["Psychological Distress Level"],
title='Psychological Distress Level in Different Sex Groups')
fig.update_layout(title_x=0.5, yaxis_title="Sex")
fig.show()
fig = px.bar(gender_ide, x='Psychological Distress Level', y="Subgroup", color='Indicator', orientation='h',
hover_data=["Psychological Distress Level"],
title='Psychological Distress Level & Gender Identity')
fig.update_layout(title_x=0.5, yaxis_title="Gender identity")
fig.show()
fig = px.bar(sex_ori, x='Psychological Distress Level', y="Subgroup", color='Indicator', orientation='h',
hover_data=["Psychological Distress Level"],
title='Psychological Distress Level & Sexual Orientation')
fig.update_layout(title_x=0.5, yaxis_title="Sexual Orientation")
fig.show()
The charts above suggest that the pandemic has exacerbated existing health inequalities. The negative effects have hit hardest the social groups that already face disadvantage and discrimination, particularly people with Black and Hispanic ethnic backgrounds and people who are sexual and gender minorities.
Finally, we will look at different states in America. Here we will compare the average Psychological Distress Level between different states.
# let's calculate the mean of Psychological Distress Level in each state
state_agg = state[["Subgroup", "Psychological Distress Level"]]
state_mean = state_agg.groupby(by="Subgroup").mean().round()
state_mean['Psychological Distress Level']
Subgroup
Alabama                 31.0
Alaska                  30.0
Arizona                 31.0
Arkansas                33.0
California              32.0
Colorado                30.0
Connecticut             28.0
Delaware                27.0
District of Columbia    29.0
Florida                 31.0
Georgia                 30.0
Hawaii                  27.0
Idaho                   28.0
Illinois                29.0
Indiana                 30.0
Iowa                    27.0
Kansas                  28.0
Kentucky                32.0
Louisiana               35.0
Maine                   28.0
Maryland                28.0
Massachusetts           28.0
Michigan                29.0
Minnesota               25.0
Mississippi             34.0
Missouri                30.0
Montana                 27.0
Nebraska                26.0
Nevada                  33.0
New Hampshire           27.0
New Jersey              29.0
New Mexico              33.0
New York                29.0
North Carolina          29.0
North Dakota            25.0
Ohio                    29.0
Oklahoma                33.0
Oregon                  32.0
Pennsylvania            29.0
Rhode Island            29.0
South Carolina          29.0
South Dakota            24.0
Tennessee               31.0
Texas                   32.0
Utah                    29.0
Vermont                 28.0
Virginia                28.0
Washington              30.0
West Virginia           33.0
Wisconsin               25.0
Wyoming                 27.0
Name: Psychological Distress Level, dtype: float64
%%capture --no-display
# The dictionary below was created by @JeffPaine (Github)
state_abbrev = {
'Alabama': 'AL',
'Alaska': 'AK',
'American Samoa': 'AS',
'Arizona': 'AZ',
'Arkansas': 'AR',
'California': 'CA',
'Colorado': 'CO',
'Connecticut': 'CT',
'Delaware': 'DE',
'District of Columbia': 'DC',
'Florida': 'FL',
'Georgia': 'GA',
'Guam': 'GU',
'Hawaii': 'HI',
'Idaho': 'ID',
'Illinois': 'IL',
'Indiana': 'IN',
'Iowa': 'IA',
'Kansas': 'KS',
'Kentucky': 'KY',
'Louisiana': 'LA',
'Maine': 'ME',
'Maryland': 'MD',
'Massachusetts': 'MA',
'Michigan': 'MI',
'Minnesota': 'MN',
'Mississippi': 'MS',
'Missouri': 'MO',
'Montana': 'MT',
'Nebraska': 'NE',
'Nevada': 'NV',
'New Hampshire': 'NH',
'New Jersey': 'NJ',
'New Mexico': 'NM',
'New York': 'NY',
'North Carolina': 'NC',
'North Dakota': 'ND',
'Northern Mariana Islands':'MP',
'Ohio': 'OH',
'Oklahoma': 'OK',
'Oregon': 'OR',
'Pennsylvania': 'PA',
'Puerto Rico': 'PR',
'Rhode Island': 'RI',
'South Carolina': 'SC',
'South Dakota': 'SD',
'Tennessee': 'TN',
'Texas': 'TX',
'Utah': 'UT',
'Vermont': 'VT',
'Virgin Islands': 'VI',
'Virginia': 'VA',
'Washington': 'WA',
'West Virginia': 'WV',
'Wisconsin': 'WI',
'Wyoming': 'WY'
}
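# copy the filtered subset first so the assignment modifies an independent DataFrame
state = state.copy()
state["Subgroup"] = state["Subgroup"].map(state_abbrev)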
state_agg = state[["Subgroup", "Psychological Distress Level"]]
state_mean = state_agg.groupby(by="Subgroup").mean()
fig = px.bar(state_mean, y='Psychological Distress Level', x=state_mean.index, text_auto='.2s',
title="Psychological Distress Level across States")
fig.update_layout(title_x=0.5)
fig.show()
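`Series.map` returns `NaN` for any name that is missing from the dictionary, and `groupby` then drops those rows without warning. A minimal sketch of a sanity check, run on a made-up example with a deliberately malformed name, lists the names that fail to map. The names `state_abbrev_toy`, `names`, `codes`, and `unmapped` are illustrative.

```python
import pandas as pd

# Tiny stand-in for the abbreviation dictionary.
state_abbrev_toy = {'Alabama': 'AL', 'Alaska': 'AK'}

# The last entry has a trailing space on purpose, so it will not match any key.
names = pd.Series(['Alabama', 'Alaska', 'Puerto Rico '])

codes = names.map(state_abbrev_toy)

# Names whose lookup produced NaN, i.e. names the dictionary does not know.
unmapped = names[codes.isna()].unique().tolist()
print(unmapped)
```

An empty list here would indicate that every state name was converted before averaging.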
fig = go.Figure(data=go.Choropleth(
locations=state_mean.index,
z = state_mean['Psychological Distress Level'].round(),
locationmode = 'USA-states',
colorscale = 'Reds',
colorbar_title = "% of respondents",
))
fig.update_layout(
title_text = 'Psychological Distress Level across States',
title_x=0.5,
geo_scope='usa',
)
fig.show()
In the interactive plot above, we can hover over a state to see its Psychological Distress Level. The map shows that distress levels differ noticeably between regions of the country.
Here we divide all the states into the four statistical regions defined by the Census Bureau (West, Midwest, Northeast, and South), which are commonly used for data collection and analysis.
# we use this as a reference
Image(url= "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Census_Regions_and_Division_of_the_United_States.svg/800px-Census_Regions_and_Division_of_the_United_States.svg.png")
def state_to_region(state):
    '''
    This is a helper function to convert a state to one of the census regions.
    '''
    northeast = ['New Jersey', 'Pennsylvania', 'New York', 'Connecticut',
                 'Rhode Island', 'Massachusetts', 'Vermont', 'New Hampshire', 'Maine']
    midwest = ['North Dakota', 'South Dakota', 'Nebraska', 'Kansas', 'Missouri', 'Iowa',
               'Minnesota', 'Wisconsin', 'Illinois', 'Michigan', 'Indiana', 'Ohio']
    south = ['Delaware', 'Maryland', 'District of Columbia', 'Virginia', 'West Virginia', 'Kentucky',
             'North Carolina', 'South Carolina', 'Tennessee', 'Arkansas', 'Louisiana', 'Mississippi',
             'Alabama', 'Georgia', 'Florida', 'Texas', 'Oklahoma']
    west = ['Arizona', 'Washington', 'Oregon', 'California', 'Hawaii', 'Alaska', 'Nevada', 'Idaho',
            'Montana', 'Utah', 'Colorado', 'Wyoming', 'New Mexico']
    if state in northeast:
        region = 'Northeast'
    elif state in midwest:
        region = 'Midwest'
    elif state in south:
        region = 'South'
    else:
        # any name not listed above falls through to 'West'
        region = 'West'
    return region
# add a new column 'Region' and convert each state to one of the four regions.
state['Region'] = state['State'].apply(state_to_region)
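The helper above assigns 'West' to any name it does not recognise, so a misspelt or unexpected state name would be placed in the wrong region without any warning. As a minimal sketch of a stricter alternative, the lookup below returns `None` for unknown names so that they can be spotted. `REGION_STATES`, `STATE_TO_REGION`, and `lookup_region` are hypothetical names, not part of the notebook.

```python
# Hypothetical names: REGION_STATES, STATE_TO_REGION and lookup_region are illustrative.
REGION_STATES = {
    'Northeast': ['New Jersey', 'Pennsylvania', 'New York', 'Connecticut', 'Rhode Island',
                  'Massachusetts', 'Vermont', 'New Hampshire', 'Maine'],
    'Midwest': ['North Dakota', 'South Dakota', 'Nebraska', 'Kansas', 'Missouri', 'Iowa',
                'Minnesota', 'Wisconsin', 'Illinois', 'Michigan', 'Indiana', 'Ohio'],
    'South': ['Delaware', 'Maryland', 'District of Columbia', 'Virginia', 'West Virginia',
              'Kentucky', 'North Carolina', 'South Carolina', 'Tennessee', 'Arkansas',
              'Louisiana', 'Mississippi', 'Alabama', 'Georgia', 'Florida', 'Texas', 'Oklahoma'],
    'West': ['Arizona', 'Washington', 'Oregon', 'California', 'Hawaii', 'Alaska', 'Nevada',
             'Idaho', 'Montana', 'Utah', 'Colorado', 'Wyoming', 'New Mexico'],
}

# Invert the mapping once: state name -> region name.
STATE_TO_REGION = {
    state_name: region
    for region, state_names in REGION_STATES.items()
    for state_name in state_names
}


def lookup_region(state_name):
    """Return the census region for a state name, or None if the name is not listed."""
    return STATE_TO_REGION.get(state_name)


print(lookup_region('Pennsylvania'))
print(lookup_region('Guam'))
```

With pandas, the same dictionary could be applied through `Series.map`, which leaves unknown names as missing values rather than assigning them a region.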
To look at the state data more closely, the interactive chart below offers intuitive navigation. Clicking the play button steps through time periods 1 to 51 (i.e., from April 2020 to November 2022). We can stop at any point by clicking the pause button, or select a specific time period to look at the relevant data. Points are colored by indicator, which makes it easy to tell the symptoms apart. Hovering over a point shows detailed information, such as state name, indicator, time period, region, and Psychological Distress Level.
fig = px.scatter(state, x="Time Period", y="Psychological Distress Level", animation_frame="Time Period",
animation_group="State", color="Indicator", hover_name="State", facet_col="Region",
log_x=False, size_max=45, range_x=[0,52], range_y=[10,55], title="Psychological distress level in the four census regions during COVID-19")
fig.show()
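The animated scatter shows one point per state, which can make the overall regional trend hard to read. A minimal sketch, on made-up toy values, of summarising each region per time period with `groupby`. The names `toy` and `region_trend` are illustrative and not part of the notebook.

```python
import pandas as pd

# Made-up toy values: two regions, two time periods, a single indicator.
toy = pd.DataFrame({
    'Region': ['South', 'South', 'West', 'West', 'South', 'West'],
    'Time Period': [1, 1, 1, 1, 2, 2],
    'Indicator': ['Anxiety'] * 6,
    'Psychological Distress Level': [30.0, 34.0, 28.0, 30.0, 36.0, 31.0],
})

# Mean distress level per region, time period, and indicator, kept in long format.
region_trend = (
    toy.groupby(['Region', 'Time Period', 'Indicator'], as_index=False)
       ['Psychological Distress Level']
       .mean()
)
print(region_trend)
```

A long-format table like this could be passed to `px.line` with `color='Region'` to compare the four regions directly.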
We have used a data set from the US Census Bureau to visualize the effects of the COVID-19 pandemic on psychological distress levels in American households. The pie, line, and bar charts allow us to make comparisons easily. The final interactive chart lets us explore the effects in different regions across America and answer specific questions we may have in mind.
The COVID-19 pandemic has exacerbated existing health inequalities within societies and between different population groups. The negative effects have hit hardest the social groups that already face disadvantage and discrimination, particularly people who have disabilities, people with Black and Hispanic ethnic backgrounds, and people who are sexual and gender minorities. Action should be taken swiftly to meet the needs of these individuals, helping them to get through the COVID-19 pandemic and to be well equipped to endure similar strains in the future.
https://www.apaservices.org/images/tile-practice-anxiety-depression-covid-19_tcm9-271978.jpg
https://data.cdc.gov/NCHS/Indicators-of-Anxiety-or-Depression-Based-on-Repor/8pt5-q6wp
https://yourfreetemplates.com/us-region-map-template/usa_census_region_map/